actor-critic algorithm
Self-Imitation Learning via Generalized Lower Bound Q-learning
The naive IS estimator involves products of the form π(a_t | x_t)/µ(a_t | x_t) and is infeasible in practice due to high variance. To control the variance, a line of prior work has focused on operator-based estimation that avoids full IS products, reducing the estimation procedure to repeated iterations of off-policy evaluation operators [1-3].
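The variance problem is easy to see numerically: the full IS weight is a product of per-step ratios π(a_t | x_t)/µ(a_t | x_t), so it grows or shrinks geometrically with the horizon. A minimal sketch (the toy policies and names here are illustrative, not from the paper):

```python
import random
random.seed(0)

def naive_is_weight(traj, pi, mu):
    """Full IS weight: product of per-step ratios pi(a|x)/mu(a|x)."""
    w = 1.0
    for x, a in traj:
        w *= pi(x, a) / mu(x, a)
    return w

# Toy two-action setting: target pi slightly prefers action 1,
# behaviour mu is uniform, so each per-step ratio is 1.2 or 0.8.
pi = lambda x, a: 0.6 if a == 1 else 0.4
mu = lambda x, a: 0.5

# The weight moves away from 1 geometrically as the horizon T grows,
# which is exactly the variance blow-up that motivates operator-based
# estimation instead of full IS products.
for T in (5, 20, 50):
    traj = [(None, random.choice([0, 1])) for _ in range(T)]
    print(T, naive_is_weight(traj, pi, mu))
```
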
Improving Sample Complexity Bounds for (Natural) Actor-Critic Algorithms
The goal of reinforcement learning (RL) [39] is to maximize the expected total reward by taking actions according to a policy in a stochastic environment, which is modelled as a Markov decision process (MDP) [4]. To obtain an optimal policy, one popular method is the direct maximization of the expected total reward via gradient ascent, which is referred to as the policy gradient (PG) method [40,47].
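The PG idea above can be sketched in its simplest form, REINFORCE on a toy two-armed bandit (the problem, parameters, and names are illustrative, not taken from the paper): sample an action from a softmax policy, then ascend the reward-weighted score function ∇ log π(a).

```python
import math, random
random.seed(0)

# Toy 2-armed bandit: arm 1 pays ~1.0 on average, arm 0 pays ~0.2.
def reward(a):
    return random.gauss(1.0 if a == 1 else 0.2, 0.1)

theta = [0.0, 0.0]          # softmax policy parameters, one per action
alpha = 0.1                 # step size

def probs(theta):
    z = [math.exp(t) for t in theta]
    s = sum(z)
    return [p / s for p in z]

for _ in range(2000):
    p = probs(theta)
    a = 0 if random.random() < p[0] else 1
    r = reward(a)
    # REINFORCE: grad log pi(a) = one_hot(a) - p; ascend r * grad
    for i in range(2):
        g = (1.0 if i == a else 0.0) - p[i]
        theta[i] += alpha * r * g

print(probs(theta))  # the policy should now strongly prefer arm 1
```
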
VIREL: A Variational Inference Framework for Reinforcement Learning
Applying probabilistic models to reinforcement learning (RL) enables the use of powerful optimisation tools such as variational inference in RL. However, existing inference frameworks and their algorithms pose significant challenges for learning optimal policies, e.g., the lack of mode capturing behaviour in pseudo-likelihood methods, difficulties learning deterministic policies in maximum entropy RL based approaches, and a lack of analysis when function approximators are used. We propose VIREL, a theoretically grounded probabilistic inference framework for RL that utilises a parametrised action-value function to summarise future dynamics of the underlying MDP, generalising existing approaches. VIREL also benefits from a mode-seeking form of KL divergence, the ability to learn deterministic optimal policies naturally from inference, and the ability to optimise value functions and policies in separate, iterative steps. In applying variational expectation-maximisation to VIREL, we thus show that the actor-critic algorithm can be reduced to expectation-maximisation, with policy improvement equivalent to an E-step and policy evaluation to an M-step. We then derive a family of actor-critic methods from VIREL, including a scheme for adaptive exploration. Finally, we demonstrate that actor-critic algorithms from this family outperform state-of-the-art methods based on soft value functions in several domains.
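The alternation described in the abstract can be illustrated on a one-state toy problem. This is only a sketch of the EM view, not VIREL's actual algorithm: the E-step (policy improvement) forms a Boltzmann policy over the current action-value estimates, and the M-step (policy evaluation) moves those estimates toward observed returns. All names and parameters here are assumptions for illustration.

```python
import math, random
random.seed(1)

rewards_mean = [0.2, 1.0]           # toy one-state, two-action problem
Q = [0.0, 0.0]                      # parametrised action-value estimates
temp = 0.5                          # temperature controlling exploration

def policy(Q):
    """E-step policy: Boltzmann distribution over current Q estimates."""
    z = [math.exp(q / temp) for q in Q]
    s = sum(z)
    return [p / s for p in z]

for _ in range(500):
    # E-step (policy improvement): act from the Boltzmann policy over Q
    p = policy(Q)
    a = 0 if random.random() < p[0] else 1
    r = random.gauss(rewards_mean[a], 0.1)
    # M-step (policy evaluation): move Q(a) toward the observed return
    Q[a] += 0.1 * (r - Q[a])

print(Q, policy(Q))
```

Lowering the temperature makes the E-step increasingly mode-seeking, which mirrors the abstract's point about recovering deterministic policies in the limit.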
Improving Sample Complexity Bounds for (Natural) Actor-Critic Algorithms
The actor-critic (AC) algorithm is a popular method to find an optimal policy in reinforcement learning. In the infinite horizon scenario, the finite-sample convergence rate for the AC and natural actor-critic (NAC) algorithms has been established recently, but under independent and identically distributed (i.i.d.) sampling and single-sample update at each iteration. In contrast, this paper characterizes the convergence rate and sample complexity of AC and NAC under Markovian sampling, with mini-batch data for each iteration, and with the actor using general policy class approximation. We show that the overall sample complexity for a mini-batch AC to attain an $\epsilon$-accurate stationary point improves the best known sample complexity of AC by an order of $\mathcal{O}(\epsilon^{-1}\log(1/\epsilon))$, and the overall sample complexity for a mini-batch NAC to attain an $\epsilon$-accurate globally optimal point improves the existing sample complexity of NAC by an order of $\mathcal{O}(\epsilon^{-2}/\log(1/\epsilon))$. Moreover, the sample complexity of AC and NAC characterized in this work outperforms that of policy gradient (PG) and natural policy gradient (NPG) by a factor of $\mathcal{O}((1-\gamma)^{-3})$ and $\mathcal{O}((1-\gamma)^{-4}\epsilon^{-2}/\log(1/\epsilon))$, respectively. This is the first theoretical study establishing that AC and NAC attain orderwise performance improvement over PG and NPG under infinite horizon due to the incorporation of critic.
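To make the setting analysed above concrete, here is a sketch of what "mini-batch AC under Markovian sampling" means mechanically: each iteration collects a batch of B consecutive (Markovian, not i.i.d.) transitions from one rollout, the critic does an averaged TD(0) update over the batch, and the actor takes a policy-gradient step using the TD error as an advantage estimate. The MDP, tables, and constants are illustrative assumptions, not the paper's construction.

```python
import math, random
random.seed(0)

# Toy 2-state, 2-action MDP (deterministic tables, illustrative only):
# action 1 always pays 1.0 and leads to state 1; action 0 pays 0.0.
P = {(0, 0): 0, (0, 1): 1, (1, 0): 0, (1, 1): 1}   # next state
R = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 0.0, (1, 1): 1.0}
gamma, alpha_c, alpha_a, B = 0.9, 0.1, 0.2, 16

V = [0.0, 0.0]                      # critic: tabular state values
logits = [[0.0, 0.0], [0.0, 0.0]]   # actor: softmax logits per state

def pi(s):
    z = [math.exp(l) for l in logits[s]]
    t = sum(z)
    return [p / t for p in z]

s = 0
for _ in range(500):                # outer iterations
    batch = []
    for _ in range(B):              # Markovian mini-batch: one rollout thread
        p = pi(s)
        a = 0 if random.random() < p[0] else 1
        s2 = P[(s, a)]
        batch.append((s, a, R[(s, a)], s2))
        s = s2
    # critic step: TD(0) update averaged over the mini-batch
    for (st, a, r, s2) in batch:
        delta = r + gamma * V[s2] - V[st]
        V[st] += alpha_c * delta / B
    # actor step: policy gradient with TD error as advantage estimate
    for (st, a, r, s2) in batch:
        delta = r + gamma * V[s2] - V[st]
        p = pi(st)
        for i in range(2):
            g = (1.0 if i == a else 0.0) - p[i]
            logits[st][i] += alpha_a * delta * g / B

print(pi(0), pi(1))  # both states should come to prefer action 1
```

The contrast with the earlier i.i.d., single-sample analyses is visible in the structure: transitions within a batch are correlated through the rollout state `s`, and both updates average over B samples per iteration.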